<html>

<head>
<title>Methods used for data analysis</title>

</head>

<body lang=EN-US link=blue vlink=purple>
<table border="0" cellpadding="10" cellspacing="5" style="margin-top:2px;font-size:1.6em;background-color:grey">
<tr>
            <td style="background-color:rgb( 149, 206, 145)"><a href="/part/">Parts List</a></td>
            <td style="background-color:rgb( 149, 206, 145)"><a href="/partlink/">Parts Relationships</a></td> 
            <td style="background-color:rgb( 149, 206, 145)"><a href="/pgroup/">Parts Subgroups</a></td>
            <td style="background-color:rgb( 149, 206, 145)"><a href="/help/">Methods</a></td>
            <td style="background-color:rgb( 149, 206, 145)"><a href="/blast/">Blast search</a></td>
</tr>
</table>

<div class=Section1>

<div>

<p class=MsoNormal style='text-autospace:none'><b><span style='font-size:14.0pt'>Methods
used for data analysis</span></b></p>

<p class=MsoNormal style='text-autospace:none'><b><span style='font-size:11.0pt'>Parts
similarity analysis</span></b></p>

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-indent:14.2pt;text-autospace:none'><span style='font-size:11.0pt'>Standalone
BLAST was used to calculate the sequence similarity between the
parts downloaded from the Registry website. We define the similarity between two
parts as: </span></p>

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-indent:14.2pt;text-autospace:none'><span lang=DA style='font-size:11.0pt'>S<sub>ij</sub>=n/min(n<sub>i</sub>,
n<sub>j</sub>)</span></p>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>where
n is the number of matched base pairs and n<sub>i</sub>, n<sub>j</sub> are the
sequence lengths of the two parts. We also define the relative length difference
between two parts as:</span></p>

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-indent:14.2pt;text-autospace:none'><span style='font-size:11.0pt'>d<sub>ij</sub>=abs(n<sub>i</sub>-n<sub>j</sub>)
/min(n<sub>i</sub>, n<sub>j</sub>)</span></p>
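
<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>The
two measures can be sketched in Python as follows. This is a minimal
illustration, not the code used for the analysis: the function names and the
example numbers are hypothetical, and extracting n from the BLAST output is
not shown.</span></p>

<pre>
```python
def similarity(n_matched: int, n_i: int, n_j: int) -> float:
    """S_ij = n / min(n_i, n_j)."""
    return n_matched / min(n_i, n_j)


def relative_length_difference(n_i: int, n_j: int) -> float:
    """d_ij = abs(n_i - n_j) / min(n_i, n_j)."""
    return abs(n_i - n_j) / min(n_i, n_j)


# Hypothetical pair: 90 matched base pairs between parts of 100 bp and 120 bp.
s = similarity(90, 100, 120)              # 90 / 100
d = relative_length_difference(100, 120)  # 20 / 100
print(s, d)
```
</pre>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>Both
measures are normalized by the shorter sequence, so a short part that is fully
contained in a longer one still reaches a similarity of 1.</span></p>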

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>We
then classify the relationships between parts into the following five types:</span></p>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>E
(equal): S<sub>ij</sub>=1 and d<sub>ij</sub>=0</span></p>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>Ex
(extra sequence): S<sub>ij</sub>=1 and 0&lt;d<sub>ij</sub>&lt;0.3</span></p>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>I
(inside): S<sub>ij</sub>=1 and d<sub>ij</sub>&gt;=0.3</span></p>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>S
(similar): 0.7&lt;S<sub>ij</sub>&lt;1 and d<sub>ij</sub>&lt;0.3</span></p>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>Si
(similar inside): 0.7&lt;S<sub>ij</sub>&lt;1 and d<sub>ij</sub>&gt;=0.3</span></p>
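
<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>The
five types can be expressed as a small decision function. The sketch below is
an illustration under two assumptions that are not stated above: the conditions
are tested in the order listed, so that each pair receives exactly one type, and
pairs with a similarity of 0.7 or less are treated as unrelated. The function
name is hypothetical.</span></p>

<pre>
```python
def classify(s: float, d: float):
    """Return the relationship type for similarity s and relative length difference d."""
    inside = d >= 0.3  # the longer part is at least 30 percent longer
    if s == 1:
        if d == 0:
            return "E"  # equal
        return "I" if inside else "Ex"  # inside / extra sequence
    if s > 0.7:
        return "Si" if inside else "S"  # similar inside / similar
    return None  # assumption: not similar enough to count as a relationship


print(classify(1.0, 0.0), classify(1.0, 0.2), classify(0.9, 0.5))
```
</pre>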

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>We
obtained a network with a total of 1447 relationships between 310 parts (66
parts have no related parts). Most of the relationships are of the types similar
(963) and similar inside (388). </span></p>

<p class=MsoNormal style='text-autospace:none'><b><span style='font-size:11.0pt'>Identification
of components and subgroups</span></b></p>

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-indent:14.2pt;text-autospace:none'><span style='font-size:11.0pt'>We
calculated the weakly connected components in the relationship network using
NetworkX (a Python package for network analysis). The largest connected
component contains 220 parts (about 70 percent of the whole network). </span></p>
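
<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>To
illustrate what a weakly connected component is, the following sketch groups
parts with a plain breadth-first search that ignores link direction. It is a
standard-library stand-in rather than the analysis code; with NetworkX, the
function <i>networkx.weakly_connected_components</i> applied to a directed graph
yields the same grouping. The part names are hypothetical.</span></p>

<pre>
```python
from collections import defaultdict, deque


def weakly_connected_components(edges):
    """Group nodes into components, ignoring the direction of each edge."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    seen = set()
    components = []
    for start in neighbours:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        components.append(component)
    # Largest component first.
    return sorted(components, key=len, reverse=True)


example_edges = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
components = weakly_connected_components(example_edges)
print([sorted(c) for c in components])
```
</pre>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>Parts
without any relationship never appear in the edge list, so they do not form
components in this sketch.</span></p>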

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-indent:14.2pt;text-autospace:none'><span style='font-size:11.0pt'>To use
this network for the reorganization of parts, we further generalized the five
types of relationships into two categories: General Inside (GI, including I and
Si) and General Similar (GS, including the other three types). We then
extracted a part relationship network that includes only the GS relationships.
We found 58 connected components comprising 212 parts in this network and
regarded them as subgroups, because all the parts in a subgroup have similar
sequences and are thus expected to have similar biological functions. </span></p>
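
<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>The
generalization into GI and GS amounts to a lookup table followed by a filter.
The sketch below assumes each relationship is stored as a tuple of two part
names and a type abbreviation; this representation, the names and the example
data are hypothetical.</span></p>

<pre>
```python
# Map each of the five relationship types to its general category.
GENERAL = {"E": "GS", "Ex": "GS", "S": "GS", "I": "GI", "Si": "GI"}


def gs_links(relationships):
    """Keep only the links whose type generalizes to General Similar."""
    return [(a, b) for a, b, kind in relationships if GENERAL[kind] == "GS"]


example = [
    ("P1", "P2", "S"),
    ("P2", "P3", "Si"),
    ("P3", "P4", "Ex"),
    ("P1", "P4", "I"),
]
print(gs_links(example))
```
</pre>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>The
connected components of the remaining GS links are the subgroups.</span></p>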

<p class=MsoNormal style='text-autospace:none'><b><span style='font-size:11.0pt'>Identification
of redundant similar relationships</span></b></p>

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-indent:14.2pt;text-autospace:none'><span style='font-size:11.0pt'>If the
sequence of part A is similar to that of part B and B is similar to C, then it
is very likely that A is also similar to C. This leads to densely connected
networks that are difficult to visualize. To show the parts relationships in
components and subgroups clearly, we removed the redundant links in the network.
We did this by sorting the parts by sequence length and directing each link from
the shorter part to the longer part. If the lengths satisfy A&lt;B&lt;C, we
kept only the A-B and B-C similar relationships and removed the A-C
relationship. </span></p>
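
<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>One
possible reading of this step is sketched below: every link is first directed
from the shorter part to the longer one, and a link A-C is then dropped whenever
some intermediate part B provides both A-B and B-C. This is an interpretation
rather than the original code; the function names, the handling of parts of
equal length (the link keeps its given order) and the example lengths are
assumptions.</span></p>

<pre>
```python
def orient(links, length):
    """Direct each undirected link from the shorter part to the longer part."""
    directed = set()
    for u, v in links:
        if length[u] > length[v]:
            u, v = v, u
        directed.add((u, v))
    return directed


def remove_redundant(directed):
    """Drop a link a -> c when some b provides both a -> b and b -> c."""
    successors = {}
    for u, v in directed:
        successors.setdefault(u, set()).add(v)
    kept = set()
    for a, c in directed:
        bypassed = any(c in successors.get(b, ()) for b in successors[a] if b != c)
        if not bypassed:
            kept.add((a, c))
    return kept


length = {"A": 100, "B": 120, "C": 150}  # hypothetical sequence lengths in bp
links = [("B", "A"), ("B", "C"), ("C", "A")]
kept = remove_redundant(orient(links, length))
print(sorted(kept))
```
</pre>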

<p class=MsoNormal><b><span style='font-size:11.0pt'>Identification of
elementary parts</span></b></p>

<p class=MsoNormal style='text-indent:14.2pt'><span style='font-size:11.0pt'>We
analyzed the directed part relationship network (links point from a shorter
sequence to a longer one) and identified the 147 parts with zero in-degree as
elementary parts. The other 163 parts are derived from these elementary
parts by sequence modification, addition of extra sequences, or combination of parts.</span></p>
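
<p class=MsoNormal style='text-indent:14.2pt'><span style='font-size:11.0pt'>Finding
the parts with zero in-degree only requires collecting every part that is the
target of some link. The sketch below assumes the links are given as pairs
directed from the shorter part to the longer one; the function name and the
example data are hypothetical.</span></p>

<pre>
```python
def elementary_parts(directed_links):
    """Return the parts that no link points to (zero in-degree)."""
    nodes = set()
    has_incoming = set()
    for shorter, longer in directed_links:
        nodes.update((shorter, longer))
        has_incoming.add(longer)
    return nodes - has_incoming


example_links = [("P1", "P2"), ("P2", "P3"), ("P4", "P3")]
print(sorted(elementary_parts(example_links)))
```
</pre>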

<p class=MsoNormal style='text-autospace:none'><b><span style='font-size:11.0pt'>Network
visualization and web design</span></b></p>

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-indent:14.2pt;text-autospace:none'><span style='font-size:11.0pt'>A
clickable SVG graph showing the parts relationships is generated automatically
when you click a component or subgroup ID. The graph is produced with pydot,
which uses the Graphviz network visualization software developed by AT&amp;T.</span></p>
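
<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>Graphviz
turns a node's URL attribute into a hyperlink when rendering to SVG, which is
what makes such a graph clickable. The sketch below builds the DOT description
as plain text instead of going through pydot, so it runs without Graphviz
installed; the function name, the URL pattern and the part names are
hypothetical, and the link colours follow the colour code listed on this page.</span></p>

<pre>
```python
# Link colours as listed in the colour code table.
LINK_COLOURS = {"E": "blue", "S": "yellow", "Ex": "red", "I": "brown", "Si": "green"}


def to_dot(links, url_pattern="/part/{}/"):
    """Build a Graphviz DOT description with one clickable node per part."""
    lines = ["digraph parts {"]
    nodes = sorted({name for a, b, _ in links for name in (a, b)})
    for node in nodes:
        # URL becomes a hyperlink on the node in SVG output.
        lines.append('  "{}" [URL="{}"];'.format(node, url_pattern.format(node)))
    for a, b, kind in links:
        lines.append('  "{}" -> "{}" [color="{}"];'.format(a, b, LINK_COLOURS[kind]))
    lines.append("}")
    return "\n".join(lines)


dot_text = to_dot([("P1", "P2", "Ex"), ("P2", "P3", "S")])
print(dot_text)
```
</pre>

<p class=MsoNormal style='text-autospace:none'><span style='font-size:11.0pt'>Saving
the text to a file and running the Graphviz command <i>dot -Tsvg</i> on it
produces the SVG image.</span></p>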

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-indent:14.2pt;text-autospace:none'><span style='font-size:11.0pt'>The
website was developed with Django, a powerful yet easy-to-use web application
framework written in Python.</span></p>

<p class=MsoNormal style='text-autospace:none'><b><span style='font-size:11.0pt'>Colour code in the Graph</span></b></p>
<b>Part colours: </b><br/>
<table border=1>
	<tr><td>Regulatory</td><td style="background-color:blue">blue</td></tr>
	<tr><td>RBS</td><td style="background-color:purple">purple</td></tr>
	<tr><td>Enzyme</td><td style="background-color:red">red</td></tr>
	<tr><td>Repressor or Activator</td><td style="background-color:brown">brown</td></tr>
	<tr><td>reporter_cds</td><td style="background-color:Green">green</td></tr>
	<tr><td>Uncategorized_Coding</td><td style="background-color:white">Black</td></tr>
	<tr><td>terminator</td><td style="background-color:orange">orange</td></tr>
	<tr><td>DNA</td><td style="background-color:pink">pink</td></tr>
	<tr><td>conjugation</td><td style="background-color:cyan">cyan</td></tr>
</table>
<b>Link colours: </b><br/>
<table border=1>
	<tr><td>equal</td><td style="background-color:blue">blue</td></tr>
	<tr><td>similar</td><td style="background-color:yellow">yellow</td></tr>
	<tr><td>extra sequence</td><td style="background-color:red">red</td></tr>
	<tr><td>inside</td><td style="background-color:brown">brown</td></tr>
	<tr><td>similar inside</td><td style="background-color:Green">green</td></tr>
</table>

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-indent:14.2pt;text-autospace:none'><span style='font-size:11.0pt'>&nbsp;</span></p>

<p class=MsoNormal style='text-align:justify;text-justify:inter-ideograph;
text-autospace:none'><b><span style='font-size:11.0pt'>Please send any comments
to <a href="mailto:hw.ma@ed.ac.uk?subject=about%20parts%20similarity%20analysis">Hongwu
Ma</a>.</span></b></p>

<p class=MsoNormal><b><span style='font-size:11.0pt'>&nbsp;</span></b></p>

</div>

</div>

</body>

</html>
